A characterization of the scientific impact of Brazilian institutions
In this paper we study the research activity of Brazilian institutions in all sciences, and their performance in the area of physics, between 1945 and December 2008. All the data come from the Web of Science database for this period. The analysis of the data shows that, within a nonextensive thermostatistical formalism, the Tsallis q-exponential distribution N(c) can provide a new characterization of the research impact of Brazilian institutions. The data examined in the present survey can be fitted successfully by a universal curve, namely $N(c) \propto 1/[1 + (q-1)\,c/T]^{1/(q-1)}$ with $q \simeq 4/3$ for all the available citations c, T being an "effective temperature". The present analysis suggests that the "effective temperature" T can provide a new performance metric for the impact level of research activity in Brazil, one that takes into account both the "quantity" (number of publications) and the "quality" (number of citations) for different Brazilian institutions. In addition, we analyze the research performance of Brazil to show how scientific research activity changes with time, for instance from 1945 to 1985, then during the periods 1986-1990, 1991-1995, and so on until the present. Finally, this work intends to show a new methodology that can be used to analyze and compare institutions within a given country.
1. INTRODUCTION

The analysis of the citations of scientific papers is an important issue that can enable a better understanding of the research activity of authors and institutions […]. The evaluation of the productivity of individual scientists has traditionally relied on the number of papers they have published. Citation analysis is becoming a popular bibliometric tool for evaluating the scientific and academic performance of individual researchers [1], journals [4, 5], universities […], and even entire countries […]. Nowadays, with easy access to the Internet and to large databases, including the Web of Science [3], comparing the impact of scientific contributions is a much easier and faster process.

Research productivity is usually measured by taking into account two different variables, namely the total number of publications and their citations. The first measure reflects research quantity and the second reflects research impact. The degree to which published works are cited by other authors is generally considered a reflection of the quality of those works [12]. Prior citation studies have analyzed a wide variety of factors, such as the distribution of citation rates [13, 15, 20].



A stretched exponential fit, motivated by multiplicative processes, has been applied to model citation distributions […]. Lehmann [15] attempted to fit both a power law and a stretched exponential to the citation distribution of 281 717 papers in the SPIRES database and showed that it is impossible to discriminate between the two models. Redner analyzed the ISI and Physical Review databases [13]. In Redner's work the fitted distribution had only partial success, whereas the same numerical data, including large citation counts c, can be fitted quite satisfactorily with a single curve by using the nonextensive thermostatistical formalism [20]. Another fitting distribution, the lognormal distribution, has been used to measure research activity [10]. A recent characterization of scientific impact has been conducted using the Tsallis q-exponential distribution [7]. In that work, scientific research activity was considered in terms of the number of publications and the number of citations, using data from the Thomson ISI Web of Science database [3] for many different countries from Latin America, Europe and South Africa. That study showed that the data for all the tested countries can be satisfactorily fitted with a single curve, which naturally emerges within the Tsallis theory […].

In this work we extend this study to the Brazilian scientific community. Traditionally, researchers and institutions have been evaluated by peer review, which is the main mechanism of merit assessment for funding, appointment and promotion decisions. There is also currently a global trend towards developing and broadening the use of bibliometric indicators to support these decisions [11]. The data show that the Brazilian contribution to international science, measured by the total number of publications, increases every year. The number of Brazilian authors and the number of Brazilian publications in the international scientific literature have grown substantially during the last decades [3]. Many studies have analyzed Brazilian scientific activity further and have also provided performance metrics for Brazilian institutions […, 17, 18].

This manuscript provides an analysis of the citations of Brazilian institutions and of their impact within a nonextensive thermostatistical formalism, using the Tsallis q-exponential distribution $N(c) \propto 1/[1 + (q-1)\,c/T]^{1/(q-1)}$ with $q \simeq 4/3$ for all the available citations c, T being an "effective temperature". Emphasis is also given to the performance of the Brazilian physics institutions and of the physics departments of Brazil's universities. The outputs of this study could be useful for Brazilian national agencies such as CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and other research support agencies, which are responsible for creating and assessing programs and projects. Finally, we propose the "effective temperature" as a scientific metric for the growing performance of Brazilian science, which could help Brazilian agencies in the evaluation of research programs.

2. NONEXTENSIVE STATISTICAL MECHANICS AND TSALLIS q-EXPONENTIAL DISTRIBUTION

Nowadays, the idea of nonextensivity is used in many applications. Nonextensive statistical mechanics has been applied successfully in physics (astrophysics, astronomy, cosmology, nonlinear dynamics) […], biology […], economics [24], and human and computer sciences […], among others […], providing interesting insights into a variety of systems.

Nonextensive statistical mechanics is based on the Tsallis entropy. Tsallis statistics […] is currently considered useful in describing the thermostatistical properties of nonextensive systems; it is based on the generalized entropic form [23]:

$$S_q = k\,\frac{1 - \sum_{i=1}^{W} p_i^{\,q}}{q - 1}, \qquad (1)$$

where W is the total number of microscopic configurations, whose probabilities are $\{p_i\}$, and k is a conventional positive constant. When q = 1 it reproduces the Boltzmann-Gibbs entropic form $S_{BG} = -k\sum_{i=1}^{W} p_i \ln p_i$. The nonextensive entropy $S_q$ attains its extreme value at equiprobability, $p_i = 1/W\ \forall i$, and this value equals $S_q = k\,\frac{W^{1-q} - 1}{1 - q}$ ($S_1 = S_{BG} = k \ln W$) [23, 26]. The Tsallis entropy is nonadditive in such a way that, for statistically independent systems A and B, the entropy satisfies the following property:

$$\frac{S_q(A+B)}{k} = \frac{S_q(A)}{k} + \frac{S_q(B)}{k} + (1-q)\,\frac{S_q(A)}{k}\,\frac{S_q(B)}{k}. \qquad (2)$$

It is subadditive for q > 1, superadditive for q < 1 and, for q = 1, it recovers the BG entropy, which is additive [26].
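The pseudo-additivity of Eq. (2) and the equiprobable value quoted above can be checked numerically. The following Python sketch is our own illustration (not part of the authors' toolchain): it evaluates Eq. (1) with k = 1 for two small, arbitrarily chosen independent distributions and compares both sides of Eq. (2).

```python
import math
from itertools import product


def tsallis_entropy(probs, q, k=1.0):
    """Eq. (1): S_q = k (1 - sum_i p_i^q) / (q - 1); Boltzmann-Gibbs form when q = 1."""
    if abs(q - 1.0) < 1e-12:
        return -k * sum(p * math.log(p) for p in probs if p > 0)
    return k * (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def joint_independent(pa, pb):
    """Joint distribution of two statistically independent systems: p_ij = p_i * p_j."""
    return [a * b for a, b in product(pa, pb)]


q = 4.0 / 3.0
pa = [0.5, 0.3, 0.2]  # arbitrary example distribution for system A
pb = [0.6, 0.4]       # arbitrary example distribution for system B

s_a = tsallis_entropy(pa, q)
s_b = tsallis_entropy(pb, q)
s_ab = tsallis_entropy(joint_independent(pa, pb), q)

# Right-hand side of Eq. (2) with k = 1
rhs = s_a + s_b + (1.0 - q) * s_a * s_b
print(abs(s_ab - rhs) < 1e-12)   # True
print(s_ab < s_a + s_b)          # True: subadditive, since q > 1

# Equiprobable case p_i = 1/W: S_q = (W^(1-q) - 1) / (1 - q)
W = 5
print(abs(tsallis_entropy([1.0 / W] * W, q) - (W ** (1 - q) - 1) / (1 - q)) < 1e-12)  # True
```

The check works for any independent pair because the sum of $(p_i p_j)^q$ factorizes into the product of the two single-system sums.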
The Boltzmann factor is generalized into a power law. The mathematical basis of Tsallis statistics includes q-generalized expressions for the logarithm and the exponential functions, namely the q-logarithm and the q-exponential functions. The q-exponential function, which reduces to exp(x) in the limit q → 1, is defined as follows:

$$e_q^{x} \equiv [1 + (1-q)\,x]^{\frac{1}{1-q}} = \frac{1}{[1 - (q-1)\,x]^{\frac{1}{q-1}}} \qquad (e_1^{x} = e^{x}). \qquad (3)$$

Recall that extremizing the entropy $S_q$ under appropriate constraints yields a probability distribution that is proportional to the q-exponential function.
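For readers who want to experiment with these functions, here is a minimal Python sketch of the q-exponential of Eq. (3) and of its inverse, the q-logarithm $\ln_q y = (y^{1-q} - 1)/(1 - q)$. The function names are our own, and the handling of a non-positive bracket (zero for q < 1, an error for q > 1) is an assumption following the usual cut-off convention; the values of q and T in the demonstration are only illustrative.

```python
import math


def exp_q(x, q):
    """Tsallis q-exponential, Eq. (3): [1 + (1 - q) x]^(1/(1 - q)); equals exp(x) for q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        if q < 1.0:
            return 0.0  # usual cut-off convention for q < 1
        raise ValueError("e_q^x diverges for x >= 1/(q - 1) when q > 1")
    return base ** (1.0 / (1.0 - q))


def ln_q(y, q):
    """q-logarithm, the inverse of exp_q: (y^(1 - q) - 1) / (1 - q); equals ln(y) for q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(y)
    return (y ** (1.0 - q) - 1.0) / (1.0 - q)


# Illustrative values close to the fit reported for Brazil in Section 4.
q, T = 4.0 / 3.0, 4.0
for c in (0, 1, 2, 10, 100):
    # e_q^{-c/T} = 1 / [1 + (q - 1) c / T]^{1/(q - 1)}, the citation law used in this paper
    print(c, exp_q(-c / T, q))
```

With q = 4/3 the exponent 1/(q − 1) equals 3, so $e_q^{-c/T} = (1 + c/(3T))^{-3}$; for example, $e_{4/3}^{-3} = 2^{-3} = 0.125$.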

In this work, we focus on the analysis of the distribution of citations of scientific publications, more precisely those catalogued by the Institute for Scientific Information (ISI), for Brazilian institutions and for Brazil as a whole. The proposed fitting distribution follows from the nonextensive formalism as $N(c) \propto 1/[1 + (q-1)\,c/T]^{1/(q-1)}$ […]. In this study we adopt the following expression:

$$N(c) = N(2)\, e_q^{-(c-2)/T}, \qquad (4)$$

where N(2) is the number of papers with two citations and, as already mentioned, T plays the role of an effective temperature.



3. THOMSON ISI WEB OF KNOWLEDGE - DATA ACQUISITION

Traditionally, the most commonly used source of bibliometric data is Thomson ISI Web of Knowledge, in particular the (Social) Science Citation Index and the Journal Citation Reports (JCR), which provide the yearly Journal Impact Factors (JIF) [3]. The subject categories and terminology provided by ISI are widely recognized by researchers and scientometricians and are relatively simple to use […]. The Institute for Scientific Information has made an industry of providing citation data to libraries since the mid-1960s; the products are currently available as part of Thomson/ISI. Although the ISI database has a few shortcomings, overall it gives wide coverage of most research fields [27]. Therefore, in our survey we use the Thomson/ISI Web of Science database to study the distribution of citations within a variety of countries.

To obtain all the necessary data we developed a program that automatically downloads the ISI bibliographic information. We take into account all document types (e.g. articles, proceedings papers, meeting abstracts) for all the available subject areas (for instance neurosciences, mathematics, chemistry) to select all the data for Brazil, and then apply the same procedure to the Brazilian institutions and to the physics departments and institutes that we are interested in.

The program was written in Delphi 7 and uses the TWebBrowser component. This component provides access to Web browser functionality and saves all the "html" pages. When a page was completely downloaded, an OnDownloadComplete event was generated and the program moved automatically to the next "html" page. When all pages were downloaded, we processed each "html" page to extract the specific information we were interested in, using the TPerlRegEx component, which is based on the open-source PCRE library (http://www.pcre.org/). In this case, we gathered the number of citations of each publication and the total number of published papers for each institution. We applied filters to obtain all these data sorted by times cited, using the citation databases Science Citation Index Expanded (SCI-EXPANDED, 1945-present), Social Sciences Citation Index (SSCI, 1956-present) and Arts & Humanities Citation Index (A&HCI, 1975-present). All the data were captured during December 2008.
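The original program was written in Delphi with TPerlRegEx; as a language-neutral illustration of the same extract-then-count step, the Python sketch below pulls one citation count per record out of saved HTML pages with a regular expression and builds the histogram N(c), the number of papers with exactly c citations. The marker "Times Cited: <n>" and the toy pages are hypothetical stand-ins: the actual 2008 Web of Science page layout is not reproduced here, so the pattern would have to be adapted to the real pages.

```python
import re
from collections import Counter

# Hypothetical marker: the real Web of Science HTML of 2008 is not reproduced here,
# so "Times Cited: <n>" only stands in for whatever text surrounds each count.
TIMES_CITED = re.compile(r"Times Cited:\s*(\d+)")


def citation_counts(html_pages):
    """Return one citation count per record found in a list of saved HTML page strings."""
    counts = []
    for page in html_pages:
        counts.extend(int(n) for n in TIMES_CITED.findall(page))
    return counts


def citation_histogram(counts):
    """N(c): number of papers that received exactly c citations."""
    return Counter(counts)


# Toy pages standing in for downloaded results (hypothetical content).
pages = [
    "<td>Times Cited: 0</td><td>Times Cited: 3</td>",
    "<td>Times Cited: 0</td><td>Times Cited: 1</td>",
]
counts = citation_counts(pages)
hist = citation_histogram(counts)
total = len(counts)
print(total, hist[0], hist[0] / total)  # 4 2 0.5
```

The last line computes the same kind of quantities reported in Table I: the total number of papers and the number and fraction of uncited papers, N(0).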
FIG. 1: Process used to obtain the best q and T (top), and probability distribution of citations for Brazil (bottom).



4. PRESENTATION OF RESULTS 

We first present the data for Brazil as a whole, captured until December 2008, and then describe the procedure that we follow to perform the final citation fitting. All the papers included in the Web of Science having at least one author with at least one affiliation address in Brazil have been collected. This means that the work includes all documents with at least one Brazilian address, with citations up to December 2008. Research done by Brazilians abroad, i.e. with only foreign addresses, is not included in the database considered. Note that the data and results are presented on a log-log scale. We first evaluate a range of values of q in order to find its optimal value and then, with this value, move to the final fitting in order to determine T. The corresponding slope (angle) of the fitted line gives the optimal value of the effective temperature T (Figure 1). With these two values (q and T) we present the fit in a log-log diagram. In the case of Brazil, a remarkably good fit is obtained with q = 1.339 and T = 4.0. This temperature provides good evidence about the impact of the published papers and enables a ranking. Figure 1 illustrates the entire process.
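The two-step procedure just described (choose q first, then read T from a slope) can be sketched in Python as follows. This is not the authors' implementation: we assume that the optimal q is the grid value that maximizes the linear-regression coefficient R² of $\ln_q[N(c)/N(1)]$ against (c − 1), following the normalization described for Figure 5, and that T = −1/slope. The synthetic histogram is generated from the model itself with q = 4/3 and T = 4, purely to check that the procedure recovers its inputs.

```python
import math


def ln_q(y, q):
    """q-logarithm: (y^(1 - q) - 1) / (1 - q); ln(y) for q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(y)
    return (y ** (1.0 - q) - 1.0) / (1.0 - q)


def linear_fit(xs, ys):
    """Ordinary least squares y = a + b x. Returns (a, b, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy * sxy / (sxx * syy) if syy > 0 else 1.0
    return a, b, r2


def fit_q_and_T(hist, q_grid):
    """hist maps c -> N(c). Step 1: pick q maximizing R^2; step 2: T = -1 / slope."""
    cs = sorted(c for c, n in hist.items() if c >= 1 and n > 0)
    xs = [c - 1 for c in cs]
    n1 = hist[1]
    best = None
    for q in q_grid:
        ys = [ln_q(hist[c] / n1, q) for c in cs]
        _, slope, r2 = linear_fit(xs, ys)
        if best is None or r2 > best[2]:
            best = (q, -1.0 / slope, r2)
    return best  # (q, T, R^2)


# Synthetic check: N(c) = 1000 * e_q^{-(c-1)/T} with q = 4/3, T = 4, for c = 1..200.
q_true, T_true = 4.0 / 3.0, 4.0
synthetic = {
    c: 1000.0 * (1.0 + (q_true - 1.0) * (c - 1) / T_true) ** (-1.0 / (q_true - 1.0))
    for c in range(1, 201)
}
q_grid = [1.0 + k / 300.0 for k in range(60, 151)]  # q from 1.2 to 1.5
q_fit, T_fit, r2_fit = fit_q_and_T(synthetic, q_grid)
print(round(q_fit, 3), round(T_fit, 2), round(r2_fit, 4))
```

Because the ratio of two q-exponentials is again a q-exponential with a shifted temperature, the fitted T depends on the reference count used for normalization: for a pure q-exponential, moving the reference from $c_0$ to $c_0'$ shifts T by $(q-1)(c_0' - c_0)$. The choice of N(1) here is therefore an assumption of the sketch.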

Next we investigate how the temperature changes over the years. As the temperature characterizes scientific impact, its evolution over the years can offer a deeper understanding of how Brazilian research activity has evolved. Figure 2 (left) presents the temperature for each period that we study, namely from 1945 to 1985, then 1986-1990, 1991-1995, and so on. This histogram highlights how scientific research activity changes with time, and suggests that the effective temperature is a reliable performance metric for the research activity of Brazil. This part of the analysis uses the entire available publication window, for all disciplines, for papers published between 1945 and December 2008. Note that for the last periods, 2001 to 2004 and 2005 to 2008, there has not been enough time for the publications to become widely known to the scientific community, so the number of their citations is small; because of this delay, the temperature for these periods is lower. Figure 2 (right) also illustrates the performance of Brazil in the domain of Physics: 39 617 papers (8 688 with zero citations, 21.9%) were published in Physics until January 2009, giving T = 4.44, which characterizes the overall research performance of the Brazilian physics community.

TABLE I: Total number of publications, and number and percentage of papers with zero and with one citation, for the tested institutions

Institution   Total papers   Zero citations N(0) (%)   One citation N(1) (%)
USP            66 404          24 197 (36.4%)            7 041 (10.6%)
UNICAMP        24 209           8 215 (33.9%)            2 771 (11.5%)
UFRJ           21 656           7 498 (34.6%)            2 591 (12.0%)
UFPE            6 032           2 067 (34.3%)              794 (13.2%)
UFRGS           5 540           2 868 (51.8%)              695 (12.5%)
UFF             5 318           1 919 (36.1%)              668 (12.6%)
UFMG            1 887             680 (36.0%)              286 (15.2%)
Brazil        285 570         108 984 (38.2%)           33 428 (11.7%)
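Splitting the records into the six publication windows used for Figure 2 (left) is a simple bookkeeping step; the Python sketch below shows one way to do it. The record format (publication year, times cited) and the sample records are assumptions for illustration, and the 1996-2000 window is inferred from the "and so on" in the text together with the statement that six periods were used.

```python
from collections import Counter, defaultdict

# Publication windows used for the temperature evolution; 1996-2000 is inferred from the text.
PERIODS = [(1945, 1985), (1986, 1990), (1991, 1995), (1996, 2000), (2001, 2004), (2005, 2008)]


def period_of(year):
    """Return the (start, end) window containing year, or None if it is outside all windows."""
    for start, end in PERIODS:
        if start <= year <= end:
            return (start, end)
    return None


def histograms_by_period(records):
    """records: iterable of (publication_year, times_cited) pairs -> {period: Counter of N(c)}."""
    out = defaultdict(Counter)
    for year, cites in records:
        period = period_of(year)
        if period is not None:
            out[period][cites] += 1
    return out


# Hypothetical records, for illustration only.
records = [(1980, 5), (1988, 0), (2006, 1), (2006, 1), (2010, 3)]
by_period = histograms_by_period(records)
print(by_period[(2005, 2008)][1])  # 2
```

Each per-period histogram can then be fitted in the same way as the full data set, giving one effective temperature per window.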

Note that the results for "Brazil" do not represent the average of the particular Brazilian institutions considered in the tables, but of all Brazilian institutions. This is because these results are obtained by entering "Brazil" in the address field. Similarly, "Brazil Physics" refers to the research performance of all Brazilian institutions in the area of physics, not only the tested ones; in this case we enter "Brazil" and Physics ("Fis") in the address field. Finally, in Tables II and III we study the institutions with a temperature greater than or equal to that of Brazil as a whole, i.e. T ≥ 4.0.

Table I presents the total number of publications and the percentage of papers cited zero times and once for the tested Brazilian institutions. The University of São Paulo (USP) achieves the highest publication productivity, with 66 404 published papers. The State University of Campinas (UNICAMP) and the Federal University of Rio de Janeiro (UFRJ) follow, with 24 209 and 21 656 research papers respectively. The remaining tested institutions published significantly fewer papers: the Federal University of Pernambuco (UFPE), the Federal University of Rio Grande do Sul (UFRGS) and the Fluminense Federal University (UFF) published 6 032, 5 540 and 5 318 papers respectively. Finally, the Federal University of Minas Gerais (UFMG) has 1 887 publications.

Next, Table II ranks the Brazilian institutions by the temperature obtained through the nonextensive distribution fit. Notice that this ranking differs from the one in Table I, which reflects the total number of published papers (a quantity ranking). The effective temperature T characterizes the scientific impact of the tested institutions. As Table II shows, in almost all cases the entropic index q is close to 4/3. The linear regression coefficient R² is also given in each case. Comparing Tables I and II, the rankings are quite different. Consider UFRJ, for instance: although it has fewer papers than UNICAMP, its effective temperature is higher (T = 4.55 versus T = 4.35).

FIG. 2: Evolution of the Brazilian effective temperature over the tested periods (left), and probability distribution of citations for Brazil in Physics (right; total number of papers 39 617, N(0) = 8 688 (21.9%), N(1) = 4 967 (12.5%), q = 1.332, R² = 0.99, T = 4.44).
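The difference between the quantity ranking of Table I and the temperature ranking of Table II can be made explicit with a few lines of Python. The sketch uses only values transcribed from those two tables, for the institutions with T ≥ 4.0.

```python
# Values transcribed from Table I (total papers) and Table II (effective temperature T).
total_papers = {"USP": 66404, "UNICAMP": 24209, "UFRJ": 21656, "UFPE": 6032, "UFF": 5318}
temperature = {"USP": 4.75, "UFRJ": 4.55, "UNICAMP": 4.35, "UFPE": 4.08, "UFF": 4.00}

by_quantity = sorted(total_papers, key=total_papers.get, reverse=True)
by_temperature = sorted(temperature, key=temperature.get, reverse=True)

# Institutions whose position differs between the two rankings
moved = [name for name, other in zip(by_quantity, by_temperature) if name != other]

print(by_quantity)     # ['USP', 'UNICAMP', 'UFRJ', 'UFPE', 'UFF']
print(by_temperature)  # ['USP', 'UFRJ', 'UNICAMP', 'UFPE', 'UFF']
print(moved)           # ['UNICAMP', 'UFRJ']
```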

Table III presents the best-fitting values of q and the effective temperature T, which characterizes the research impact of the Brazilian institutions in Physics. UFMG was not included in this analysis because it has too few publications in the Web of Science database [3] for a reliable statistical analysis. The Centro Brasileiro de Pesquisas Físicas (CBPF) is also included in this survey. Table III shows that CBPF, USP and UNICAMP achieve the highest temperatures for research activity in Physics under the new metric T. It is also worth mentioning that the physics institutes and departments of the universities are responsible for both undergraduate and graduate students and are administratively attached to the Ministry of Education, whereas CBPF is responsible only for graduate students and is attached to the Ministry of Science and Technology. This may be one of the reasons why CBPF achieves a higher temperature. It is also important to note the performance of UFPE and UFRJ: while UFPE's temperature increases significantly in Physics, UFRJ has a lower temperature in Physics (T = 4.10) than for its overall research impact across all sciences (T = 4.55).

Figures 3 and 4 illustrate the fits for different Brazilian institutions using the nonextensive distribution N(c). Figure 3 shows publications in all sciences (left) and research activity in the physics domain (right). As we can observe, physics generally shows a higher research impact than the overall university activity. Finally, Figure 4 presents the CBPF and UFPE fitting curves obtained with the new characterization of citation impact. CBPF achieves the highest performance, with T = 5.32 and q = 1.336. UFPE attains T = 4.76 in the physics domain, whereas the citation impact metric for UFPE as a whole is T = 4.08.
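The statement that physics generally shows a higher impact can be quantified directly from the reported temperatures. The sketch below takes the values from Table II (all sciences) and Table III (Physics) for the five universities present in both and computes ΔT = T(Physics) − T(all sciences); CBPF is omitted because it appears only in Table III.

```python
# Effective temperatures transcribed from Table II (all sciences) and Table III (Physics).
t_all = {"USP": 4.75, "UFRJ": 4.55, "UNICAMP": 4.35, "UFPE": 4.08, "UFF": 4.00}
t_phys = {"USP": 5.13, "UFRJ": 4.10, "UNICAMP": 5.0, "UFPE": 4.76, "UFF": 4.08}

# Delta T = T(Physics) - T(all sciences), rounded to the two decimals used in the tables
delta = {name: round(t_phys[name] - t_all[name], 2) for name in t_all}
lower_in_physics = [name for name, d in delta.items() if d < 0]

print(delta)
print(lower_in_physics)  # ['UFRJ']
```

Four of the five universities have a positive ΔT, the largest being UFPE (0.68) and UNICAMP (0.65); UFRJ is the only exception.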

From all the above experimental results, we obtain a value 
of q close to 4/3. 
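A value of q close to 4/3 has a simple consequence for the shape of the fitted curve, which we add here as a short worked derivation of our own, based only on the functional form of N(c). For small c the q-exponential behaves like an ordinary exponential, while for large c it becomes a power law whose exponent is fixed by q:

```latex
N(c) \propto \left[1 + (q-1)\,\frac{c}{T}\right]^{-\frac{1}{q-1}}
\;\approx\;
\begin{cases}
e^{-c/T}, & c \ll T/(q-1),\\[4pt]
\left[\dfrac{(q-1)\,c}{T}\right]^{-\frac{1}{q-1}} \propto c^{-\frac{1}{q-1}}, & c \gg T/(q-1).
\end{cases}
```

With q = 4/3 one has 1/(q − 1) = 3 and T/(q − 1) = 3T, so for the Brazilian value T = 4.0 the crossover lies near c ≈ 12, beyond which the fitted curve decays approximately as $c^{-3}$.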



5. CONCLUSIONS 

Nowadays the number of citations is among the most widely used measures of academic performance. Extended study of citation distributions helps to better understand the mechanisms behind citations and can objectively establish a comparative measure of scientific performance. Citations of scientific papers in fact constitute a network consisting of authors (nodes) and directed links (citations) among them. Recently, such networks have been described, studied, characterized and represented by parameters using typical concepts from the area of Complex Systems.

The entropic index q in the Tsallis entropy is usually interpreted as a quantity characterizing the degree of nonextensivity of a system. An appropriate choice of the entropic index q for nonextensive physical systems remains an open field of study. In some cases the physical meaning of q is unknown; nevertheless, it provides new possibilities for comparison between theoretical approaches and experimental data. Other cases are better understood, and q then has a clear physical meaning at a microscopic or mesoscopic level, or both.

In this paper we characterize the citation impact of Brazilian institutions using the Tsallis q-exponential distribution. We also show how scientific research activity changed over six periods from 1945 to 2008. The present study provides a new performance metric, based on nonextensive statistical mechanics, for ranking and evaluating the research production of institutions. The proposed Tsallis q-exponential distribution satisfactorily describes the Institute for Scientific Information citations of Brazilian institutions and Brazilian physics departments between 1945 and December 2008.

Our study provides evidence that the citation distribution for all the tested cases within this period is well described by the Tsallis q-exponential distribution. Our findings also support the effectiveness of T and of the ranking we proposed based on it. Figure 5 shows the q-logarithm of the normalized number of publications, $\ln_q[N(c)/N(1)]$, versus the number of citations minus one, (c − 1), for three Brazilian universities (UFF, UNICAMP, USP). USP has the highest citation impact, UNICAMP an intermediate T, and UFF the lowest temperature of the three (T = 4.00). It is important to notice that −1/T corresponds to the average slope associated with each university, which explains the meaning of T and of the ranking that we proposed based on this new performance metric.

TABLE II: Best-fitting values of q and effective temperature T. The tested institutions are ranked according to T.

Institution   Entropic index q   Linear regression coefficient R²   Temperature T
USP           1.339              0.99                               4.75
UFRJ          1.300              0.99                               4.55
UNICAMP       1.330              0.99                               4.35
UFPE          1.336              0.99                               4.08
UFF           1.335              0.99                               4.00
Brazil        1.339              0.99                               4.00

TABLE III: Best-fitting values of q and effective temperature T in Physics. The tested institutions are ranked according to T.

Physics          Total papers   Zero citations N(0) (%)   Entropic index q   Linear regression coefficient R²   Temperature T
CBPF              3 680           658 (17.9%)             1.336              0.99                               5.32
USP               8 781         1 776 (20.2%)             1.320              0.99                               5.13
UNICAMP           3 992           809 (20.3%)             1.330              0.99                               5.0
UFPE              1 685           311 (18.5%)             1.336              0.99                               4.76
UFRJ              5 089         1 646 (32.3%)             1.336              0.99                               4.10
UFF               1 512           309 (20.4%)             1.332              0.99                               4.08
Brazil Physics   39 617         8 688 (21.9%)             1.332              0.99                               4.44
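The slope interpretation follows directly from the fact that the q-logarithm inverts the q-exponential. Assuming, as in the description of Figure 5, that the fitted law is normalized at c = 1, a one-line derivation gives:

```latex
N(c) = N(1)\,e_q^{-(c-1)/T}
\;\Longrightarrow\;
\ln_q\frac{N(c)}{N(1)}
= \frac{\left[N(c)/N(1)\right]^{1-q} - 1}{1-q}
= \frac{\left[1 + (q-1)\frac{c-1}{T}\right] - 1}{1-q}
= -\frac{c-1}{T}.
```

In the coordinates of Figure 5 the data therefore fall on a straight line through the origin with slope −1/T: a higher temperature means a flatter line, i.e. a slower decay of N(c) with c.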

It is remarkable how well the proposed nonextensive distribution fits all cited papers for all the institutions. This part of the analysis uses the entire available publication window, for all disciplines, for papers published between 1945 and December 2008. The present article also focuses on the performance of Brazilian institutions in physics. In the present study we used a single database for the extraction of the articles and their numbers of citations. The ISI Web of Science was chosen because it is one of the main databases providing citation information. Although our strategy might have left some publications out of the analysis, we believe that the sample of articles is representative of the core international scientific production of the Brazilian institutions. The new performance metric of citation impact is a balanced combination of "quantity" (number of publications) and "quality" (number of citations), which are its main factors. It should be kept in mind, however, that citation rates reflect the use and impact of scientific information and do not necessarily express quality.

This work intends to show how the new methodology can be used to analyze and compare institutions within a given country. A case study of selected Brazilian institutions and their physics departments is used to investigate the effectiveness of the new characterization of citation impact. Future work can address other scientific fields in these important Brazilian universities, or universities in other countries, and how they evolved over the same period. It is also important to study universities, countries or other scientific institutions with an extremely high number of papers with zero or one citation, and to observe the impact of their research activity. The extent to which these numbers affect the proposed performance metric will be the subject of further study.